Security Analysis of mvHash-B Similarity Hashing
نویسندگان
چکیده
In the era of big data, the volume of digital data is increasing rapidly, causing new challenges for investigators to examine the same in a reasonable amount of time. A major requirement of modern forensic investigation is the ability to perform automatic filtering of correlated data, and thereby reducing and focusing the manual effort of the investigator. Approximate matching is a technique to find “closeness” between two digital artifacts. mvHash-B is a wellknown approximate matching scheme used for finding similarity between two digital objects and produces a ‘score of similarity’ on a scale of 0 to 100; however, no security analysis of mvHash-B is available in the literature. In this work, we perform the first academic security analysis of this algorithm and show that it is possible for an attacker to “fool” it by causing the similarity score to be close to zero even when the objects are very similar. By similarity of the objects, we mean semantic similarity for text and visual match for images. The designers of mvHash-B had claimed that the scheme is secure against ‘active manipulation.’ We contest this claim in this work. We propose an algorithm that starts with a given document and produces another one of the same size without influencing its semantic and visual meaning (for text and image files, respectively) but which has low similarity score as measured by mvHash-B. In our experiments, we show that the similarity score can be reduced from 100 to less than 6 for text and image documents. We performed experiments with 50 text files and 200 images and the average similarity score between the original file and the file produced by our algorithm was found to be 4 for text files and 6 for image files. In fact, if the original file size is small then the similarity score between the two files was close to 0, almost always. To improve the security of mvHash-B against active adversaries, we propose a modification in the scheme. We show that the modification prevents the attack we describe in this work.
منابع مشابه
Image authentication using LBP-based perceptual image hashing
Feature extraction is a main step in all perceptual image hashing schemes in which robust features will led to better results in perceptual robustness. Simplicity, discriminative power, computational efficiency and robustness to illumination changes are counted as distinguished properties of Local Binary Pattern features. In this paper, we investigate the use of local binary patterns for percep...
متن کاملEnabling Privacy-Assured Similarity Retrieval over Millions of Encrypted Records
Searchable symmetric encryption (SSE) has been studied extensively for its full potential in enabling exact-match queries on encrypted records. Yet, situations for similarity queries remain to be fully explored. In this paper, we design privacy-assured similarity search schemes over millions of encrypted high-dimensional records. Our design employs locality-sensitive hashing (LSH) and SSE, wher...
متن کاملb-Bit Minwise Hashing for Estimating Three-Way Similarities
Computing1 two-way and multi-way set similarities is a fundamental problem. This study focuses on estimating 3-way resemblance (Jaccard similarity) using b-bit minwise hashing. While traditional minwise hashing methods store each hashed value using 64 bits, b-bit minwise hashing only stores the lowest b bits (where b ≥ 2 for 3-way). The extension to 3-way similarity from the prior work on 2-way...
متن کاملA Mutual Authentication Method for Internet of Things
Today, we are witnessing the expansion of various Internet of Things (IoT) applications and services such as surveillance and health. These services are delivered to users via smart devices anywhere and anytime. Forecasts show that the IoT, which is controlled online in the user environment, will reach 25 billion devices worldwide by 2020. Data security is one of the main concerns in the IoT. ...
متن کاملSimilarity Hashing : A Computer Vision Solution to theInverse Problem of Linear
The dicult task of nding a fractal representation of an input shape is called the inverse problem of fractal geometry. Previous attempts at solving this problem have applied techniques from numerical minimization, heuristic search and image compression. The most appropriate domain from which to attack this problem is not numerical analysis nor signal processing, but model-based computer vision....
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- JDFSL
دوره 11 شماره
صفحات -
تاریخ انتشار 2016